Website Social Links Scraper

Pricing

from $2.00 / 1,000 social profiles extracted

Blazing fast scraper to extract social media links (Facebook, Twitter, LinkedIn, Instagram, etc.) from websites. Includes advanced filtering to avoid share links and deduplication by domain.

Rating: 5.0 (1)

Developer: CodeScraper

Maintained by Community

Actor stats

  • Bookmarked: 1
  • Total users: 4
  • Monthly active users: 3
  • Last modified: 8 hours ago

Website Social Links Scraper – High-Speed Profile Extractor

This Apify actor extracts social media profiles and community links directly from websites with high accuracy and intelligent validation.

It combines Cheerio-based HTML scraping, advanced regex matching, smart brand-name filtering, internal link discovery, and domain-level aggregation to identify, validate, deduplicate, and organize social profiles while filtering out share links, tracking URLs, irrelevant external pages, and false positives.


What It Does

For every website URL provided, the actor extracts:

Website Overview

  • Input URL
  • Target Domain
  • Brand Name
  • Pages Scanned
  • Total Profiles Found

Social Profile Data

For each social platform found:

  • Twitter / X
  • LinkedIn
  • Instagram
  • Facebook
  • YouTube
  • TikTok
  • GitHub
  • Discord
  • Reddit
  • Pinterest

It Handles

  • Multiple website URLs and raw domains
  • Automatic URL normalization
  • High-speed raw HTML scraping
  • Anti-blocking capabilities
  • Smart Brand-Name Filtering
  • Share-link and intent-link filtering
  • Domain-level deduplication
  • Same-hostname internal crawling
  • Cross-page profile aggregation
  • Platform-specific validation
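
To make "raw domains" and "automatic URL normalization" concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the actor's code: the helper names `normalize_url`, `target_domain`, and `brand_name` are hypothetical, and the rule "brand name = first label of the domain" is a naive guess that would pick `blog` for `blog.example.com`.

```python
from urllib.parse import urlparse


def normalize_url(raw: str) -> str:
    """Turn a raw domain or URL into a scheme://host/path URL (assumed rule)."""
    raw = raw.strip()
    if "://" not in raw:
        raw = "https://" + raw  # bare domains such as "apify.com" get a scheme
    parsed = urlparse(raw)
    host = (parsed.hostname or "").lower()  # .hostname is lower-cased and has no port
    path = parsed.path.rstrip("/")          # query string and fragment are dropped
    return f"{parsed.scheme}://{host}{path}"


def target_domain(url: str) -> str:
    """Hostname without a leading 'www.' (assumption: no public-suffix handling)."""
    host = urlparse(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def brand_name(domain: str) -> str:
    """Naive brand guess: the first label of the domain."""
    return domain.split(".")[0]


url = normalize_url("apify.com")
print(url, target_domain(url), brand_name(target_domain(url)))
```

With the input `apify.com`, this yields the same `inputUrl`, `targetDomain`, and `brandName` values as the example output on this page.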

How It Works

  1. Loads website URLs and normalizes domains

  2. Extracts the base domain and brand name

  3. Fetches raw HTML using CheerioCrawler with optimized request handling

  4. Scans:

    • HTML content
    • Anchor tags
    • Embedded JSON data
    • Next.js hydration data
    • Structured metadata
  5. Uses advanced regular expressions to identify social profile URLs

  6. Filters:

    • Share links
    • Intent endpoints
    • Tracking parameters
    • Non-profile URLs
  7. Applies Smart Brand-Name Validation to ensure profiles belong to the target website

  8. Discovers and crawls internal pages such as:

    • Contact pages
    • About pages
    • Team pages
    • Community pages
  9. Aggregates and deduplicates profiles across all crawled pages

  10. Saves structured results to the Apify Dataset
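
Steps 5, 6, and 9 (regex matching, share-link filtering, deduplication) can be sketched in a few lines of Python. This is a simplified illustration, not the actor's implementation: the pattern `URL_RE`, the `PLATFORM_HOSTS` table, and the `NON_PROFILE_SEGMENTS` blocklist are assumptions chosen for the example, and the sketch covers only four platforms.

```python
import re
from urllib.parse import urlparse

# Assumed, simplified pattern: any absolute http(s) URL in the raw HTML.
URL_RE = re.compile(r"https?://[^\s\"'<>\\]+")

# Hypothetical host table; the real actor supports more platforms.
PLATFORM_HOSTS = {
    "twitter": {"twitter.com", "x.com"},
    "linkedin": {"linkedin.com"},
    "facebook": {"facebook.com"},
    "github": {"github.com"},
}

# Hypothetical blocklist of first path segments that are never profiles.
NON_PROFILE_SEGMENTS = {"share", "sharer", "sharer.php", "sharearticle", "intent", "login"}


def extract_profiles(html: str) -> dict:
    """Return {platform: [profile URLs]} with share links and duplicates removed."""
    profiles = {}
    seen = set()
    for raw in URL_RE.findall(html):
        try:
            parsed = urlparse(raw)
            host = (parsed.hostname or "").lower()
        except ValueError:
            continue  # malformed URL, e.g. a broken IPv6 literal
        if host.startswith("www."):
            host = host[4:]
        platform = next((p for p, hosts in PLATFORM_HOSTS.items() if host in hosts), None)
        if platform is None:
            continue
        segments = [s for s in parsed.path.split("/") if s]
        if not segments or segments[0].lower() in NON_PROFILE_SEGMENTS:
            continue  # bare homepage, share link, or intent endpoint
        # Rebuilding the URL drops the query string, which removes tracking parameters.
        clean = f"https://{host}/{'/'.join(segments)}"
        key = clean.lower()  # case-insensitive deduplication
        if key not in seen:
            seen.add(key)
            profiles.setdefault(platform, []).append(clean)
    return profiles


sample_html = """
<a href="https://twitter.com/intent/tweet?url=https://example.com">Share</a>
<a href="https://x.com/apify?utm_source=footer">X</a>
<a href="https://X.com/apify/">X again</a>
<a href="https://www.linkedin.com/company/apify">LinkedIn</a>
<a href="https://www.facebook.com/sharer/sharer.php?u=https://example.com">Share</a>
"""
print(extract_profiles(sample_html))
```

The sketch does not decode JSON-escaped URLs such as `https:\/\/x.com\/apify`; a scanner for embedded JSON would need to unescape them first.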


Input Configuration

| Field | Type | Description | Default |
| --- | --- | --- | --- |
| startUrls | Array | List of website URLs or domains to scrape | [] |
| platforms | Array | Social platforms to extract | ["facebook","twitter","linkedin"] |
| maxPagesPerDomain | Integer | Maximum pages to crawl per domain | 10 |
| proxyConfiguration | Object | Proxy settings for anti-blocking | {"useApifyProxy":true} |
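
When building input programmatically, it can help to merge overrides onto the documented defaults and reject obvious mistakes before starting a run. The Python sketch below assumes a hypothetical helper `build_input`; the platform identifiers `instagram`, `reddit`, and `pinterest` are guesses based on the platform list, since only the other seven appear in this page's examples.

```python
import copy

# Defaults taken from the input table above.
DEFAULT_INPUT = {
    "startUrls": [],
    "platforms": ["facebook", "twitter", "linkedin"],
    "maxPagesPerDomain": 10,
    "proxyConfiguration": {"useApifyProxy": True},
}

# Assumed identifiers; "instagram", "reddit", and "pinterest" are not confirmed by the examples.
SUPPORTED_PLATFORMS = {
    "twitter", "linkedin", "instagram", "facebook", "youtube",
    "tiktok", "github", "discord", "reddit", "pinterest",
}


def build_input(overrides: dict) -> dict:
    """Merge overrides onto the defaults and validate the result (hypothetical helper)."""
    unknown = set(overrides) - set(DEFAULT_INPUT)
    if unknown:
        raise ValueError(f"Unknown input fields: {sorted(unknown)}")
    merged = copy.deepcopy(DEFAULT_INPUT)  # never mutate the shared defaults
    merged.update(overrides)
    bad = [p for p in merged["platforms"] if p not in SUPPORTED_PLATFORMS]
    if bad:
        raise ValueError(f"Unsupported platforms: {bad}")
    pages = merged["maxPagesPerDomain"]
    if not isinstance(pages, int) or pages < 1:
        raise ValueError("maxPagesPerDomain must be a positive integer")
    return merged


run_input = build_input({"startUrls": ["apify.com"], "maxPagesPerDomain": 5})
print(run_input)
```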

Example Input

{
  "startUrls": [
    "apify.com"
  ],
  "platforms": [
    "twitter",
    "linkedin",
    "youtube",
    "tiktok",
    "github",
    "discord"
  ],
  "maxPagesPerDomain": 5
}

Example Output

{
  "inputUrl": "https://apify.com",
  "targetDomain": "apify.com",
  "brandName": "apify",
  "statistics": {
    "pagesScanned": 5,
    "totalProfilesFound": 8
  },
  "socialProfiles": {
    "twitter": [
      "https://x.com/apify"
    ],
    "linkedin": [
      "https://linkedin.com/company/apify"
    ],
    "youtube": [
      "https://www.youtube.com/apify"
    ],
    "tiktok": [
      "https://www.tiktok.com/@apifytech"
    ],
    "github": [
      "https://github.com/apify"
    ],
    "discord": [
      "https://discord.com/invite/jyEM2PRvMU",
      "https://discord.gg/w3e2v7rWDw"
    ]
  }
}

If no social profiles are found:

{
  "inputUrl": "https://example.com",
  "targetDomain": "example.com",
  "brandName": "example",
  "statistics": {
    "pagesScanned": 2,
    "totalProfilesFound": 0
  },
  "socialProfiles": {}
}
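
For CRM enrichment or spreadsheet export, the nested `socialProfiles` object is often easier to use as flat rows. The Python sketch below shows one way to do that; `flatten_profiles` is a hypothetical helper, and the row keys `domain`, `platform`, and `url` are names chosen for this example.

```python
def flatten_profiles(record: dict) -> list:
    """Turn one output record into a list of {domain, platform, url} rows."""
    rows = []
    for platform, urls in record.get("socialProfiles", {}).items():
        for url in urls:
            rows.append({
                "domain": record["targetDomain"],
                "platform": platform,
                "url": url,
            })
    return rows


record = {
    "inputUrl": "https://apify.com",
    "targetDomain": "apify.com",
    "brandName": "apify",
    "socialProfiles": {
        "twitter": ["https://x.com/apify"],
        "discord": [
            "https://discord.com/invite/jyEM2PRvMU",
            "https://discord.gg/w3e2v7rWDw",
        ],
    },
}
for row in flatten_profiles(record):
    print(row["domain"], row["platform"], row["url"])
```

A record with an empty `socialProfiles` object simply produces no rows.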

Error Handling

If a website cannot be accessed or processed:

{
  "inputUrl": "https://example.com",
  "targetDomain": "example.com",
  "status": "Failed",
  "error": "Request failed completely (check proxy or rate limits)",
  "statistics": {
    "pagesScanned": 0,
    "totalProfilesFound": 0
  },
  "socialProfiles": {}
}
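
Downstream code can separate failed records from usable ones by checking the `status` field. The Python sketch below assumes that a record without `"status": "Failed"` succeeded, which matches the examples on this page but is not stated explicitly; `split_by_status` is a hypothetical helper.

```python
def split_by_status(items: list) -> tuple:
    """Partition dataset items into (succeeded, failed) lists."""
    succeeded, failed = [], []
    for item in items:
        # Assumption: successful records carry no "status" field at all.
        if item.get("status") == "Failed":
            failed.append(item)
        else:
            succeeded.append(item)
    return succeeded, failed


items = [
    {"inputUrl": "https://apify.com", "socialProfiles": {"github": ["https://github.com/apify"]}},
    {
        "inputUrl": "https://example.com",
        "status": "Failed",
        "error": "Request failed completely (check proxy or rate limits)",
        "socialProfiles": {},
    },
]
ok, failed = split_by_status(items)
retry_urls = [item["inputUrl"] for item in failed]  # candidates for a second run
print(len(ok), retry_urls)
```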

Features

  • Accurate social profile extraction
  • High-speed HTML-based scraping
  • Built-in anti-blocking mechanisms
  • Smart brand-name validation
  • Automatic internal page discovery
  • Cross-page deduplication
  • Domain-level aggregation
  • Platform-specific filtering
  • Share-link elimination
  • Structured social profile organization

Use Cases

  • B2B Lead Generation
  • CRM Data Enrichment
  • Influencer Discovery
  • Brand Intelligence Research
  • Competitor Analysis
  • Sales Prospecting
  • Community Discovery
  • Cold Outreach Campaigns

FAQs

1. Why is it so fast?

The actor downloads and parses raw HTML directly instead of launching a full browser session. Because CheerioCrawler never renders the page, each request costs far less CPU and memory than a browser-based crawl, which raises throughput.


2. Will it work on JavaScript-heavy websites?

In many cases, yes. Modern frameworks such as Next.js and Nuxt often embed their application state directly into HTML. Since the scraper scans the entire HTML response, including embedded JSON data, it can frequently extract social profiles without rendering JavaScript.
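
To illustrate why embedded state is reachable without rendering, the Python sketch below pulls URLs out of a Next.js `__NEXT_DATA__` script tag found in raw HTML. It is an illustration only: the actor's own scanning logic is not published here, the helper names are hypothetical, and the sample page is made up.

```python
import json
import re

# Next.js (pages router) serializes page state into this script tag.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*id="__NEXT_DATA__"[^>]*>(.*?)</script>', re.DOTALL
)


def iter_strings(node):
    """Yield every string value nested anywhere inside parsed JSON."""
    if isinstance(node, str):
        yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_strings(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_strings(value)


def urls_from_next_data(html: str) -> list:
    """Return absolute URLs found in the embedded Next.js state, if any."""
    match = NEXT_DATA_RE.search(html)
    if not match:
        return []
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return []
    return [s for s in iter_strings(data) if s.startswith(("http://", "https://"))]


page = (
    '<html><body><div id="__next"></div>'
    '<script id="__NEXT_DATA__" type="application/json">'
    '{"props":{"pageProps":{"footer":{"links":["https:\\/\\/github.com\\/apify","/about"]}}}}'
    "</script></body></html>"
)
print(urls_from_next_data(page))
```

Parsing the JSON also unescapes sequences such as `https:\/\/`, which a plain regex scan over the HTML would miss. Links that exist only after client-side code runs are not in the HTML response and cannot be found this way.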


3. How does Smart Brand-Name Filtering work?

The actor extracts the website's brand name from its domain and validates discovered social profile handles against that brand. This prevents unrelated user profiles from being incorrectly associated with the target website.


4. Why are Discord links exempt from brand-name validation?

Discord invite URLs typically contain randomized invite codes rather than brand names. To avoid missing legitimate community links, Discord URLs are exempt from strict brand-name validation.
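
The brand-name check and the Discord exemption described above can be sketched in Python as follows. This is a guess at the general idea, not the actor's algorithm: substring matching on the last path segment, the `EXEMPT_HOSTS` set, and the helper name `belongs_to_brand` are all assumptions made for illustration.

```python
from urllib.parse import urlparse

# Assumed exemption list, based on the Discord hosts in the example output.
EXEMPT_HOSTS = {"discord.com", "discord.gg"}


def _alnum(text: str) -> str:
    """Lower-case and keep only letters and digits, so 'Apify_Tech' becomes 'apifytech'."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


def belongs_to_brand(profile_url: str, brand: str) -> bool:
    """Accept a profile if its handle contains the brand name (assumed matching rule)."""
    parsed = urlparse(profile_url)
    host = (parsed.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if host in EXEMPT_HOSTS:
        return True  # invite codes are random, so no brand check is possible
    segments = [s for s in parsed.path.split("/") if s]
    needle = _alnum(brand)
    if not segments or not needle:
        return False
    handle = segments[-1].lstrip("@")  # e.g. "@apifytech" or "apify"
    return needle in _alnum(handle)


for url in (
    "https://www.tiktok.com/@apifytech",
    "https://linkedin.com/company/apify",
    "https://x.com/some_random_user",
    "https://discord.gg/w3e2v7rWDw",
):
    print(url, belongs_to_brand(url, "apify"))
```

Substring matching is permissive by design: it accepts handles such as `apifytech`, at the cost of occasionally admitting an unrelated account whose handle happens to contain the brand.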


Developer Info

Author: codescraper

Email: codescraper011@gmail.com


Tags

social-scraper · social-links-extractor · lead-generation · data-enrichment · osint · b2b-data · contact-extractor · web-scraping